1. Introduction

Suppose we have the ingredients for a proper Bayesian analysis. For this we
observe data $x$ from a statistical model $\{f_\theta : \theta \in \Theta\}$, where $f_\theta$ is a density with
respect to support measure $\mu$ on the sample space $\mathcal{X}$, and we have a proper
prior density $\pi$ on $\Theta$, with respect to support measure $\nu$ on $\Theta$. With these
ingredients we have available the joint distribution of $(\theta, X)$, as given by the
density $f_\theta(x)\pi(\theta)$ with respect to support measure $\nu \times \mu$, and the observed
value $x$. We denote the prior predictive measure of $X$ by $M(B) = E_\Pi(P_\theta(B))$
and the posterior measure of $\theta$ by $\Pi(A \mid x)$. For a quantity of interest $\tau = \Upsilon(\theta)$,
taking values in a set $\mathcal{T}$, we denote the marginal posterior and prior measures of
$\tau$ by $\Pi_\Upsilon(\cdot \mid x)$ and $\Pi_\Upsilon$ respectively, with corresponding densities $\pi_\Upsilon(\cdot \mid x)$ and
$\pi_\Upsilon$, taken with respect to a support measure $\nu_{\mathcal{T}}$ on $\mathcal{T}$.

Bayes theorem, or the principle of conditional probability, says that any
probability statements about the unknown $\theta$, after observing $x$, should be based on
the posterior $\Pi(\cdot \mid x)$. These ingredients alone, however, do not prescribe what
$\gamma$-credible region $B_\gamma(x) \subset \mathcal{T}$ we should quote for $\tau = \Upsilon(\theta)$. Since there are
typically many subsets of $\mathcal{T}$ containing $\gamma$ of the posterior probability, we need
a rule for choosing among them.

Relative surprise credible regions for $\tau$, as discussed in Evans (1997), are
based on a particular approach to assessing a hypothesis $H_0 : \tau = \tau_0$. For this
we compute the observed relative surprise (ORS) given by

$$\Pi_\Upsilon\left( \frac{\pi_\Upsilon(\tau \mid x)}{\pi_\Upsilon(\tau)} > \frac{\pi_\Upsilon(\tau_0 \mid x)}{\pi_\Upsilon(\tau_0)} \,\Big|\, x \right). \tag{1}$$

We see that (1) compares the relative increase in belief for $\tau_0$, from a priori to
a posteriori, with this increase for each of the other possible values in $\mathcal{T}$. Other
approaches to measuring surprise are discussed in Good (1988). For estimation,
we consider (1) as a function of $\tau_0$ and select a value which minimizes this
quantity as the estimate, called the least relative surprise estimate (LRSE). To
obtain a $\gamma$-credible region for $\tau$ we simply invert (1) in the standard way to
obtain the $\gamma$-relative surprise region

$$C_\gamma(x) = \left\{ \tau_0 : \Pi_\Upsilon\left( \frac{\pi_\Upsilon(\tau \mid x)}{\pi_\Upsilon(\tau)} > \frac{\pi_\Upsilon(\tau_0 \mid x)}{\pi_\Upsilon(\tau_0)} \,\Big|\, x \right) \le \gamma \right\}. \tag{2}$$

One virtue of relative surprise inferences is that they are invariant under
reparameterizations.
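
The definitions in (1) and (2) can be illustrated on a finite parameter set, where the densities are simply probability mass functions. The following is a minimal sketch under that assumption; the function name `relative_surprise` and the toy prior and posterior are hypothetical and are not taken from the paper.

```python
def relative_surprise(prior, post, gamma):
    """ORS, gamma-relative surprise region and LRSE on a finite set.

    prior, post: dicts mapping each value of tau to its prior and
    posterior probability (hypothetical toy inputs, prior[t] > 0).
    """
    # Ratio of posterior to prior, the quantity compared in (1).
    ratio = {t: post[t] / prior[t] for t in prior}
    # ORS at t0: posterior probability of values with a strictly larger ratio.
    ors = {t0: sum(post[t] for t in prior if ratio[t] > ratio[t0])
           for t0 in prior}
    # Region (2): all values whose ORS does not exceed gamma.
    region = {t0 for t0 in prior if ors[t0] <= gamma}
    # A value maximizing the ratio has ORS equal to 0, so it minimizes (1).
    lrse = max(ratio, key=ratio.get)
    return ors, region, lrse


prior = {0: 0.5, 1: 0.3, 2: 0.2}
post = {0: 0.2, 1: 0.3, 2: 0.5}
ors, region, lrse = relative_surprise(prior, post, gamma=0.6)
```

In this toy case the region collects the values with the largest increase in belief first, and its posterior content is at least the requested level.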

In Evans, Guttman and Swartz (2006) it was shown that relative surprise
inferences possess an optimal property in the class of Bayesian inferences. In that
development, (2) was taken as the basic concept. In particular, if we consider the
class of all $\gamma$-credible regions for $\tau = \Upsilon(\theta)$, then the $\gamma$-relative surprise region for
$\tau$ has the smallest prior content among all $\gamma$-credible regions for this quantity.
Hypothesis assessments and estimates are derived from relative surprise regions
in a direct way and so also possess optimal properties. The LRSE is obtained by
taking the region with $\gamma = 0$ and the ORS is obtained as $\inf\{\gamma : \tau_0 \in C_\gamma(x)\}$. In
section 2 we show that this optimal property has a direct interpretation in terms
of minimizing the prior probability of covering a false value and argue that this
is an appropriate way to assess repeated sampling properties in contexts where
we have a proper prior. In section 3 we prove that, for relative surprise regions,
the prior probability of covering a false value is always bounded above by the
prior probability of covering the true value and so such sets are, in a generalized
sense, unbiased.

As discussed in Evans and Zou (2002) and Evans, Guttman and Swartz (2006),
there is a close connection between relative surprise inferences and Bayes
factors. In section 3 we establish some results that deepen this connection and
show that relative surprise inferences lead to optimal results when interpreted
in terms of Bayes factors. In particular, we prove that a $\gamma$-relative surprise region
$C_\gamma(x)$ for $\tau = \Upsilon(\theta)$ always has a Bayes factor in favor of the region
containing the true value bounded below by unity and, moreover, the Bayes factor is
maximized among all $\gamma$-credible regions for $\tau$ by $C_\gamma(x)$. Further, we introduce
the relative belief ratio as an alternative method for measuring change in
belief from a priori to a posteriori, and show that this is also bounded below by
unity for relative surprise regions and that such regions maximize this quantity
as well. The lower bound can be seen as a natural consistency requirement on
inferences, in the sense that it would be odd to report a $\gamma$-credible region for $\tau$
for which our belief in the set containing the true value declined from a priori to
a posteriori. While a decline in belief from a priori to a posteriori makes sense
for any particular subset of $\mathcal{T}$, it doesn't make sense for our best report for a
set supposedly containing the true value, as the data are suggesting otherwise.
Further, the optimality result indicates that we are making the best use of the
data, from this point of view, when we choose to use relative surprise regions. In
section 4 we show that in quite general circumstances relative surprise regions
arise as a member of an equivalence class of credible regions under
reparameterizations. Further, we show that choosing among these regions is equivalent
to choosing the measure we use to construct an hpd-like credible region, where
hpd stands for highest posterior density. We argue that the most natural choice
of this measure is $\Pi_\Upsilon$, which gives relative surprise regions.

2. Covering false values 

Suppose we have a rule for determining a $\gamma$-credible region for $\tau = \Upsilon(\theta)$ based
on the sampling model and prior, i.e., for each $\gamma \in [0, 1]$ and $x \in \mathcal{X}$, the rule
determines a region $B_\gamma(x) \subset \mathcal{T}$ satisfying $\Pi_\Upsilon(B_\gamma(x) \mid x) \ge \gamma$. The coverage
$P_\theta(\Upsilon(\theta) \in B_\gamma(X))$ of this region is then of considerable interest, particularly
when $\Pi$ is taken to be a diffuse prior. In such a context it seems natural to
ask that a $\gamma$-credible region satisfy, or at least approximately satisfy, the
confidence property $P_\theta(\Upsilon(\theta) \in B_\gamma(X)) \ge \gamma$ for all $\theta \in \Theta$. In other words, in
an i.i.d. sequence $x_i \sim P_\theta$ for $i = 1, 2, \ldots$, we require that the proportion
of times that $\Upsilon(\theta) \in B_\gamma(x_i)$ is at least $\gamma$, and also that this property hold
for all $\theta \in \Theta$. It is well-known that Bayesian credible regions do not
generally possess this property, see Joshi (1974), and in fact can perform rather
poorly in this regard. A number of papers discuss issues concerned with
comparing frequency and Bayesian inferences, including Berger and Sellke (1987),
Casella and Berger (1987), and Samaniego and Reneau (1994), as well as the
texts Gelman, Carlin, Stern and Rubin (2004), Carlin and Louis (2000), and
Robert (2001).

We restrict to proper priors, as then, letting $E_M$ denote expectation with
respect to the prior predictive distribution of the data,

$$\gamma \le E_M(\Pi_\Upsilon(B_\gamma(X) \mid X)) = \int_\Theta \int_{\mathcal{X}} I_{B_\gamma(x)}(\Upsilon(\theta))\, P_\theta(dx)\, \Pi(d\theta) = E_\Pi(P_\theta(\Upsilon(\theta) \in B_\gamma(X))). \tag{3}$$

This can be interpreted as saying that the prior probability the $\gamma$-credible
region $B_\gamma$ contains a value $\Upsilon(\theta)$, when $\theta \sim \Pi$, is at least $\gamma$. This probability can
also be given a long-run relative frequency interpretation in the i.i.d. sequence
$(\theta_i, x_i) \sim \Pi \times P_\theta$ for $i = 1, 2, \ldots$, as the proportion of times $\Upsilon(\theta_i) \in B_\gamma(x_i)$.
Various arguments can be offered for the restriction to proper priors, e.g., see
DeGroot (1970). In particular, when we have a proper prior, this long-run
relative frequency seems more appropriate than the confidence property, as the
confidence property requires good coverage at values of $\theta$ that have a priori very
little weight. Further, while the confidence property has its appeal, the plethora
of absurd confidence regions, see Plante (1991) for some discussion, might at
least lead one to doubt the wisdom of focusing too closely on confidence.

Property (3) holds for any $\gamma$-credible region $B_\gamma$ and so does not help us choose
among them. Consider, however, the accuracy of the region $B_\gamma$, where this is
measured by the probability of $B_\gamma$ covering an independent false value $\tau \sim \Lambda$,
where $\Lambda$ is a probability measure on $\mathcal{T}$.

Definition 1. The prior probability of covering a false value from probability
measure $\Lambda$ is given by

$$E_M(\Lambda(B_\gamma(X))) = \int_\Theta \int_{\mathcal{T}} P_\theta(\tau \in B_\gamma(X))\, \Lambda(d\tau)\, \Pi(d\theta). \tag{4}$$

So the "true" value of $\theta$ is generated from the prior $\Pi$, the data $x$ is generated
from $P_\theta$, and the "false" value $\tau$ of the parameter of interest is generated from
$\Lambda$ independent of the true value, i.e., $\tau$ has no connection with the data.

To obtain a $\gamma$-credible region $B_\gamma$ that minimizes (4) we make use of the
following results. Suppose we have a probability measure $P$ and a $\sigma$-finite measure
$Q$ on a set $\Omega$. Further, suppose that $P$ and $Q$ are both absolutely continuous
with respect to the same measure on $\Omega$ with respective densities $p$ and $q$. Let
$D_\gamma = \{\omega_0 \in \Omega : P(p(\omega)/q(\omega) > p(\omega_0)/q(\omega_0)) \le \gamma\}$ for $\gamma \in [0, 1]$. Lemma 1 and
Theorem 2 are proved in Evans, Guttman and Swartz (2006). In that paper $P$
is taken to be the posterior but otherwise the proofs are the same.

Lemma 1. $P(D_\gamma) \ge \gamma$ with equality whenever the distribution of $p(\omega)/q(\omega)$,
with $\omega \sim P$, has no atoms.

Theorem 2. The set $D_\gamma$ minimizes $Q(D)$ among all measurable sets $D \subset \Omega$
satisfying $P(D) \ge P(D_\gamma)$. Further, when the distribution of $p(\cdot)/q(\cdot)$ has no
atoms, then $D_\gamma$ minimizes $Q(D)$ among all measurable sets $D \subset \Omega$ satisfying
$P(D) \ge \gamma$.
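
The first claim of Theorem 2 can be checked by brute force on a small finite set, where every subset can be enumerated. The sketch below is only an illustration under that assumption: the six-point set, the masses `p`, the weights `q` and the function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues in the comparisons.

```python
from fractions import Fraction as F
from itertools import combinations


def d_gamma(p, q, gamma):
    """D_gamma on the finite set {0, ..., len(p) - 1}."""
    idx = range(len(p))
    ratio = [p[w] / q[w] for w in idx]
    return {w0 for w0 in idx
            if sum(p[w] for w in idx if ratio[w] > ratio[w0]) <= gamma}


def min_q_over_competitors(p, q, target):
    """Smallest Q(D) over all subsets D with P(D) >= target."""
    idx = range(len(p))
    best = None
    for r in range(len(p) + 1):
        for D in combinations(idx, r):
            if sum(p[w] for w in D) >= target:
                q_of_D = sum(q[w] for w in D)
                best = q_of_D if best is None else min(best, q_of_D)
    return best


# Hypothetical example: P is a probability measure, Q is a finite measure.
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]
q = [F(3), F(1), F(1), F(2), F(1), F(2)]

D = d_gamma(p, q, F(1, 2))
P_D = sum(p[w] for w in D)
Q_D = sum(q[w] for w in D)
```

Because the ratio $p/q$ has atoms here, $P(D_\gamma)$ exceeds $\gamma = 1/2$, which is why the comparison class is defined through $P(D) \ge P(D_\gamma)$ rather than $P(D) \ge \gamma$.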

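The following result establishes the optimality, with respect to (4), of hpd-like
credible regions as defined in (5).

Theorem 3. Suppose that the probability distribution $\Lambda$ is also absolutely
continuous with respect to $\nu_{\mathcal{T}}$ on $\mathcal{T}$ with density $\lambda$. Then, in the Bayesian model
specified by $\Pi \times P_\theta$, the region $B_{\Lambda,\gamma}$ given by

$$B_{\Lambda,\gamma}(x) = \left\{ \tau_0 \in \mathcal{T} : \Pi_\Upsilon\left( \frac{\pi_\Upsilon(\tau \mid x)}{\lambda(\tau)} > \frac{\pi_\Upsilon(\tau_0 \mid x)}{\lambda(\tau_0)} \,\Big|\, x \right) \le \gamma \right\} \tag{5}$$

minimizes (4) among all regions $B$ satisfying $\Pi_\Upsilon(B(x) \mid x) \ge \Pi_\Upsilon(B_{\Lambda,\gamma}(x) \mid x)$.
If $\Pi_\Upsilon(B_{\Lambda,\gamma}(x) \mid x) = \gamma$ for each $x$, then $B_{\Lambda,\gamma}$ minimizes (4) among all $\gamma$-credible
regions for $\Upsilon(\theta)$.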

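Proof. Putting $P = \Pi_\Upsilon(\cdot \mid x)$ and $Q = \Lambda$ in Theorem 2 implies that $B_{\Lambda,\gamma}(x)$
minimizes $\Lambda(B(x))$ among all $B$ satisfying $\Pi_\Upsilon(B(x) \mid x) \ge \Pi_\Upsilon(B_{\Lambda,\gamma}(x) \mid x)$. Now
$E_M(\Lambda(B(X)))$ is minimized, among all regions $B$ satisfying $\Pi_\Upsilon(B(x) \mid x) \ge
\Pi_\Upsilon(B_{\Lambda,\gamma}(x) \mid x)$ for each $x$, by $B = B_{\Lambda,\gamma}$. Observe that

$$E_M(\Lambda(B(X))) = \int_\Theta \int_{\mathcal{X}} E_\Lambda(I_{B(x)}(\tau))\, P_\theta(dx)\, \Pi(d\theta) = \int_\Theta \int_{\mathcal{X}} \int_{\mathcal{T}} I_{B(x)}(\tau)\, \Lambda(d\tau)\, P_\theta(dx)\, \Pi(d\theta)$$
$$= \int_\Theta \int_{\mathcal{T}} \int_{\mathcal{X}} I_{B(x)}(\tau)\, P_\theta(dx)\, \Lambda(d\tau)\, \Pi(d\theta) = \int_\Theta \int_{\mathcal{T}} P_\theta(\tau \in B(X))\, \Lambda(d\tau)\, \Pi(d\theta),$$

which is (4). This completes the proof. □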

Specializing this to the choice $\Lambda = \Pi_\Upsilon$, we have the following result.

Corollary 4. If $\Lambda = \Pi_\Upsilon$, then $B_{\Lambda,\gamma} = C_\gamma$, i.e., the optimal region is the
$\gamma$-relative surprise region.

This says that $C_\gamma$ minimizes the prior probability of covering a false value
of the parameter of interest, when the false value follows the marginal prior
distribution $\Pi_\Upsilon$ and is independent of the true value of the model parameter.
It seems natural to take $\Lambda = \Pi_\Upsilon$ as this distribution identifies the values of $\tau$
that we consider a priori at least plausible. A repeated sampling interpretation
of this is obtained by considering a sequence $(\theta_i, x_i, \tau_i)$, for $i = 1, 2, \ldots$, of
independent values from the joint distribution $\Pi \times P_\theta \times \Pi_\Upsilon$. Then Corollary 4
says that, among all $\gamma$-credible regions $B_\gamma$ for $\Upsilon(\theta)$ formed from $\Pi \times P_\theta$, a
$\gamma$-relative surprise region for $\Upsilon(\theta)$ minimizes the proportion of times the event
$\tau_i \in B_\gamma(x_i)$ is true. Of course, the event $\Upsilon(\theta_i) \in B_\gamma(x_i)$ is also true at least $\gamma$
of the time in this sequence.

It is worth noting that Theorem 3 and Corollary 4 also hold when $\theta \sim \Pi^*$
for any probability distribution $\Pi^*$, i.e., $\int_\Theta \int_{\mathcal{T}} P_\theta(\tau \in B_\gamma(X))\, \Lambda(d\tau)\, \Pi^*(d\theta)$ is
minimized among all $\gamma$-credible regions $B_\gamma$ for $\Upsilon(\theta)$, constructed from $\Pi \times P_\theta$, by
$B_{\Lambda,\gamma}$. The proof is the same. So, for example, we could take $\Pi^*$ to be degenerate
at some value and $C_\gamma$ would still be optimal. From a practical point of view,
however, the choices $\Pi^* = \Pi$ and $\Lambda = \Pi_\Upsilon$ seem to be the most sensible, as (4)
then has an interpretation as a prior probability.

In addition, (4) with $\Lambda = \Pi_\Upsilon$ would appear to have several uses. First, we
can quote this probability as a way of assessing the accuracy of $C_\gamma$ with a given
prior. If this probability is quite high, then we have a region with low accuracy.
Also, (4) can be used for experimental design purposes such as setting sample
size. Consider the following example.

Example 1 (Location normal). Suppose that $x = (x_1, \ldots, x_n)$ is a sample
from the $N(\theta, 1)$ distribution and $\theta \sim N(0, \sigma^2)$, so the posterior distribution of
$\theta$ is $N((n + 1/\sigma^2)^{-1} n\bar{x}, (n + 1/\sigma^2)^{-1})$. The ratio of the posterior density to the
prior density is, in this case, proportional to the likelihood $\exp\{-n(\theta - \bar{x})^2/2\}$.
Therefore, a $\gamma$-relative surprise interval for $\theta$ is a likelihood interval and so takes
the form $C_\gamma(x) = \bar{x} \pm k_\gamma(n, \bar{x}, \sigma^2)$ where $k_\gamma(n, \bar{x}, \sigma^2) \ge 0$ satisfies

$$\gamma = \Phi\big((n + 1/\sigma^2)^{-1/2}\bar{x}/\sigma^2 + (n + 1/\sigma^2)^{1/2} k_\gamma(n, \bar{x}, \sigma^2)\big)$$
$$\qquad - \Phi\big((n + 1/\sigma^2)^{-1/2}\bar{x}/\sigma^2 - (n + 1/\sigma^2)^{1/2} k_\gamma(n, \bar{x}, \sigma^2)\big). \tag{6}$$

The value $k_\gamma(n, \bar{x}, \sigma^2)$ is easily obtained numerically from (6). Now $\Pi_\Upsilon(C_\gamma(x)) =
\Phi((\bar{x} + k_\gamma(n, \bar{x}, \sigma^2))/\sigma) - \Phi((\bar{x} - k_\gamma(n, \bar{x}, \sigma^2))/\sigma)$ and $\bar{x} \sim N(0, \sigma^2 + 1/n)$.
Therefore, when $\Lambda = \Pi_\Upsilon$, (4) is given by



M. Evans and M. Shakhatreh/ Optimal properties of some Bayesian inferences 1273 



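$$\int_{-\infty}^{\infty} \Big\{ \Phi\big(((\sigma^2 + 1/n)^{1/2} z + k_\gamma(n, (\sigma^2 + 1/n)^{1/2} z, \sigma^2))/\sigma\big)$$
$$\qquad - \Phi\big(((\sigma^2 + 1/n)^{1/2} z - k_\gamma(n, (\sigma^2 + 1/n)^{1/2} z, \sigma^2))/\sigma\big) \Big\}\, \varphi(z)\, dz. \tag{7}$$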

For example, the following table gives some values of the prior probability of
covering a false value when $\gamma = .95$ and $\sigma^2 = 1$, based on a Monte Carlo
integration sample size of $10^3$, with the standard errors in parentheses.

n                                          1             10            25            50
$E_M(\Pi_\Upsilon(C_\gamma(X)))$           .700 (.004)   .322 (.004)   .212 (.003)   .152 (.002)
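
The entries of this table can be approximated by solving (6) for the half-width by bisection and then evaluating (7) numerically. The sketch below is one way to do this and is not the authors' code: it replaces the Monte Carlo integration by a trapezoid rule on a fixed grid (an assumption made here so that the result is deterministic), and the function names, grid size and truncation point are all hypothetical choices.

```python
import math


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def k_gamma(n, xbar, sigma2, gamma):
    """Half-width of the gamma-relative surprise interval, solving (6)."""
    prec = n + 1.0 / sigma2               # posterior precision
    a = xbar / (sigma2 * math.sqrt(prec))
    b = math.sqrt(prec)

    def content(k):                       # posterior content of xbar +/- k
        return Phi(a + b * k) - Phi(a - b * k)

    lo, hi = 0.0, 1.0
    while content(hi) < gamma:            # expand until the root is bracketed
        hi *= 2.0
    for _ in range(80):                   # bisection on the increasing function
        mid = 0.5 * (lo + hi)
        if content(mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prior_false_coverage(n, sigma2=1.0, gamma=0.95, grid=2001, zmax=8.0):
    """Trapezoid-rule approximation to the integral in (7)."""
    sigma = math.sqrt(sigma2)
    sd_xbar = math.sqrt(sigma2 + 1.0 / n)  # prior predictive sd of xbar
    h = 2.0 * zmax / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -zmax + i * h
        xbar = sd_xbar * z
        k = k_gamma(n, xbar, sigma2, gamma)
        prior_content = Phi((xbar + k) / sigma) - Phi((xbar - k) / sigma)
        weight = 0.5 if i in (0, grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * prior_content * density * h
    return total


v1 = prior_false_coverage(1)
v10 = prior_false_coverage(10)
```

Under these assumptions the computed values should be comparable to the tabulated ones and should decrease as $n$ grows, which is the behaviour used for sample size determination.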



It is straightforward to show that (7) converges to 0 as $n \to \infty$. So, by
choosing $n$ large enough, we can make (7) as small as we like and so control the
error in our inference. If $\sigma^2 \to \infty$, so the prior is becoming more diffuse, the
prior probability of covering a false value generated from the prior converges to
0. This is exactly how we would want our region to behave, namely, the data
become much more important in determining the inference as the prior becomes
more diffuse. For example, $C_\gamma(x) \to \bar{x} \pm n^{-1/2} z_{(1+\gamma)/2}$ as $\sigma^2 \to \infty$. So for a very
diffuse prior, $C_\gamma(x)$ has a very small probability of covering an independently
generated value from the prior.
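
The limiting interval can be read off from (6). The following short derivation is a sketch, assuming that $n$ and $\bar{x}$ are held fixed, that $k_\gamma(n, \bar{x}, \sigma^2)$ converges to a limit $k$ as $\sigma^2 \to \infty$, and writing $z_p$ for the $p$-th quantile of the standard normal distribution.

```latex
\begin{align*}
\gamma
  &= \lim_{\sigma^2 \to \infty}
     \Big[ \Phi\big((n + 1/\sigma^2)^{-1/2}\bar{x}/\sigma^2
                    + (n + 1/\sigma^2)^{1/2} k_\gamma(n, \bar{x}, \sigma^2)\big)
         - \Phi\big((n + 1/\sigma^2)^{-1/2}\bar{x}/\sigma^2
                    - (n + 1/\sigma^2)^{1/2} k_\gamma(n, \bar{x}, \sigma^2)\big) \Big] \\
  &= \Phi(n^{1/2} k) - \Phi(-n^{1/2} k)
   = 2\,\Phi(n^{1/2} k) - 1,
\end{align*}
so that $\Phi(n^{1/2} k) = (1 + \gamma)/2$ and hence $k = n^{-1/2} z_{(1+\gamma)/2}$.
```

The limit is the usual frequentist interval for a normal mean with known variance, so its half-width stays fixed while the prior spreads out, which is why its prior content tends to 0.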

Example 1 illustrates that we can't use (4), with $\Lambda = \Pi_\Upsilon$, to compare priors.
A more concentrated prior will give a higher value for (4) than a more diffuse one;
however, we have different regions $C_\gamma(x)$ and different distributions for the false
values under different priors. This emphasizes the importance of a careful choice
of the prior so that unrealistic values of the parameter are excluded.

Similar optimality results can be obtained for the ORS given by (1). For
suppose we agree to reject the hypothesis $H_0 : \Upsilon(\theta) = \tau_0$ whenever the ORS is
greater than $\gamma$. This is equivalent to rejecting $H_0$ whenever $\tau_0 \notin C_\gamma(x)$. Now
consider the class of tests specified by $\gamma$-credible regions $B_\gamma$, so we reject $H_0$
whenever $\tau_0 \notin B_\gamma(x)$. In this case, we want to find $B_\gamma$ maximizing

$$\int_\Theta \int_{\mathcal{T}} P_\theta(\tau \notin B_\gamma(X))\, \Pi_\Upsilon(d\tau)\, \Pi(d\theta \mid \Upsilon(\theta) = \tau_0). \tag{8}$$

This is the conditional prior probability, given that $H_0$ is true, that we would
reject the hypothesis specified by $\tau$, when $\tau$ is a value independently generated
from the prior. The quantity (8) is clearly analogous to power in the frequentist
context. Then, arguing as in Theorem 3 and Corollary 4, we have that (8) is
maximized by $C_\gamma^c(x)$ among all rejection regions with posterior content less than
or equal to $1 - \gamma$. Also, we can use (8) to determine a sample size so that the
test based on $C_\gamma^c$ has a prescribed value for this conditional prior probability.



3. Change in belief and unbiasedness 



For $C \subset \mathcal{T}$, $BF_C(x) = \{\Pi_\Upsilon(C \mid x)/(1 - \Pi_\Upsilon(C \mid x))\}\{\Pi_\Upsilon(C)/(1 - \Pi_\Upsilon(C))\}^{-1}$ is
the Bayes factor in favor of the true value of $\tau$ being in $C$. If we let $C$ shrink
nicely to $\tau_0$ (as in Rudin (1974), p. 163, a sequence of Borel sets $C_i$ shrinks nicely
to a point $\tau_0$ if there is an $\alpha > 0$ such that each $C_i$ lies in an open ball $B(\tau_0, r_i)$
centered at $\tau_0$ and of radius $r_i > 0$, $\mu(C_i) \ge \alpha\mu(B(\tau_0, r_i))$ for every $i$,
where $\mu$ is volume measure, and $r_i \to 0$ as $i \to \infty$), then $BF_C(x)$ converges
to $\pi_\Upsilon(\tau_0 \mid x)/\pi_\Upsilon(\tau_0)$ whenever these densities are continuous at $\tau_0$. So we can
think of this quantity as an approximation to the Bayes factor associated with
$\tau_0$ and the ORS is a calibration of this value to determine if it is indeed small
and thus evidence against $\tau_0$ as a plausible value.

The Bayes factor in favor of $C$ is a measure of the change in our belief that $C$
contains the true value from a priori to a posteriori. Perhaps a simpler measure
of this change in belief is given by the following.

Definition 2. The relative belief ratio of a subset $C \subset \mathcal{T}$ is given by $RB_C(x) =
\Pi_\Upsilon(C \mid x)/\Pi_\Upsilon(C)$.

Again, as $C$ shrinks nicely to $\{\tau_0\}$, $RB_C(x)$ converges to $\pi_\Upsilon(\tau_0 \mid x)/\pi_\Upsilon(\tau_0)$.
Note that $BF_C(x) = RB_C(x)/RB_{C^c}(x)$ and so $BF_C$ is not a function of $RB_C$
or conversely. They are measuring change in belief on different scales. Clearly
the two will be approximately equal when $RB_{C^c}(x) \approx 1$ and this will occur
whenever $C$ is "small".
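
The relation between the two measures is simple arithmetic on the prior and posterior contents of $C$. The sketch below illustrates this; the function names and the numerical contents are hypothetical values chosen only for illustration.

```python
def relative_belief(post_content, prior_content):
    """Relative belief ratio of a set from its posterior and prior contents."""
    return post_content / prior_content


def bayes_factor(post_content, prior_content):
    """Bayes factor in favor of a set: posterior odds over prior odds."""
    post_odds = post_content / (1.0 - post_content)
    prior_odds = prior_content / (1.0 - prior_content)
    return post_odds / prior_odds


# A large set: the two measures differ substantially.
post_c, prior_c = 0.95, 0.72
bf_large = bayes_factor(post_c, prior_c)
rb_large = relative_belief(post_c, prior_c)
rb_large_complement = relative_belief(1.0 - post_c, 1.0 - prior_c)

# A small set: the complement has relative belief ratio near 1,
# so the two measures nearly agree.
bf_small = bayes_factor(0.002, 0.001)
rb_small = relative_belief(0.002, 0.001)
```

The first case shows how far apart the two scales can be for a region with high posterior content, while the second shows the near agreement for small sets noted above.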

Now consider a $\gamma$-relative surprise region $C_\gamma(x)$ for $\tau$. From (2), and the fact
that the function $\Pi_\Upsilon(\pi_\Upsilon(\tau \mid x)/\pi_\Upsilon(\tau) > k \mid x)$ is right-continuous in $k$, there
exists $k_\gamma(x)$ such that $C_\gamma(x) = \{\tau : \pi_\Upsilon(\tau \mid x)/\pi_\Upsilon(\tau) \ge k_\gamma(x)\}$. From this we
have the following property for relative surprise regions. We assume throughout
the remainder of this section that $\pi_\Upsilon(\tau) > 0$ for every $\tau \in \mathcal{T}$.

Lemma 5. The relative surprise region $C_\gamma(x)$ satisfies $RB_{C_\gamma(x)}(x) \ge k_\gamma(x)$.

Proof. We have that $\Pi_\Upsilon(C_\gamma(x) \mid x) = \int_{C_\gamma(x)} \pi_\Upsilon(\tau \mid x)\, \nu_{\mathcal{T}}(d\tau)$, which is clearly
greater than or equal to $k_\gamma(x) \int_{C_\gamma(x)} \pi_\Upsilon(\tau)\, \nu_{\mathcal{T}}(d\tau) = k_\gamma(x)\Pi_\Upsilon(C_\gamma(x))$, proving the result. □

So Lemma 5 says that the ratio of posterior to prior probabilities of $C_\gamma(x)$
satisfies the same inequality that the respective densities do on this set.
We have an important lower bound on $BF_{C_\gamma(x)}(x)$ and $RB_{C_\gamma(x)}(x)$.

Lemma 6. The relative surprise region $C_\gamma(x)$ satisfies $BF_{C_\gamma(x)}(x) \ge 1$ and
$RB_{C_\gamma(x)}(x) \ge 1$.

Proof. Clearly we have that $C_\gamma^c(x) = \{\tau : \pi_\Upsilon(\tau \mid x)/\pi_\Upsilon(\tau) < k_\gamma(x)\}$ and, as in
Lemma 5, this implies that $RB_{C_\gamma^c(x)}(x) \le k_\gamma(x)$. Combining this with Lemma 5
gives that $BF_{C_\gamma(x)}(x) \ge 1$. Since $BF_C(x) \ge 1$ for $C = C_\gamma(x)$, then $1/\Pi_\Upsilon(C) \ge 1/\Pi_\Upsilon(C \mid x)$
and this implies that $RB_{C_\gamma(x)}(x) \ge 1$. □

Accordingly the Bayes factor and the relative belief ratio always indicate an
increase in belief in the set $C_\gamma(x)$ from a priori to a posteriori. In particular, the
posterior probability content of $C_\gamma(x)$ is always greater than its prior content.
Note that, since $BF_{C^c}(x) = 1/BF_C(x)$, we have that $BF_{C_\gamma^c(x)}(x) \le 1$ and
$RB_{C_\gamma^c(x)}(x) \le 1$ for a relative surprise region $C_\gamma(x)$.
Note that the fact the Bayes factor and relative belief ratio are always greater
than 1 for a relative surprise region does not imply that relative surprise
inferences never find evidence against a hypothesized value $H_0 : \tau = \tau_0$. For we
assess $H_0$ by computing (1), or equivalently from (2), computing $\gamma^* = \inf\{\gamma :
\tau_0 \in C_\gamma(x)\}$. If $\gamma^*$ is large (near 1), then we have evidence against $H_0$.
Alternatively, we could select an appropriate $\gamma$ and report $C_\gamma(x)$ as our best choice of a
$\gamma$-credible region to contain the true value. If $\tau_0 \notin C_\gamma(x)$, then we have evidence
against $H_0$.

Of course, there may be other credible regions with these properties. For
example, hpd regions often have these properties, although there does not seem
to be an easy general proof of this. In any case, the following shows that relative
surprise regions are best from this point of view.

Theorem 7. The set $C_\gamma(x)$ has maximal Bayes factor and maximal relative
belief ratio among all measurable sets $C \subset \mathcal{T}$ satisfying $\Pi_\Upsilon(C \mid x) = \Pi_\Upsilon(C_\gamma(x) \mid x)$.

Proof. From Theorem 2 we know that $\Pi_\Upsilon(C)$ is minimized, among all
measurable $C$ satisfying $\Pi_\Upsilon(C \mid x) \ge \Pi_\Upsilon(C_\gamma(x) \mid x)$, by taking $C = C_\gamma(x)$. So $\Pi_\Upsilon(C)$
is also minimized by the same choice when we restrict to those $C$ satisfying
$\Pi_\Upsilon(C \mid x) = \Pi_\Upsilon(C_\gamma(x) \mid x)$. Since $f(x) = (1 - x)/x$ is decreasing in $x$, the result
follows for the Bayes factor and is obvious for the relative belief ratio. □

Theorem 7 is most relevant when there are a number of credible regions,
including the relative surprise region $C_\gamma(x)$, with posterior content exactly equal
to $\gamma$. Theorem 7 then says that $C_\gamma(x)$ is the best choice among these regions
from the point of view of the Bayes factor and the relative belief ratio, as it
provides the largest increase in belief from a priori to a posteriori. We have the
following immediate consequence.

Corollary 8. Suppose that the true value of $\theta$ is selected according to $\Pi$. Then
$E_M(BF_{B_\gamma(X)}(X))$ and $E_M(RB_{B_\gamma(X)}(X))$ are maximized, among credible
regions $B_\gamma(x)$ satisfying $\Pi_\Upsilon(B_\gamma(x) \mid x) = \Pi_\Upsilon(C_\gamma(x) \mid x)$ for all $x$, by $B_\gamma(x) =
C_\gamma(x)$.

This says that the prior mean Bayes factor and prior mean relative belief
ratio are maximized by $C_\gamma(x)$.

Consider the following example as an illustration.

Example 2 (Probability of joint success). Suppose we observe $x$ from a
Binomial$(n, \theta_1)$, an independent $y$ from a Binomial$(n, \theta_2)$, we put independent
uniform priors on $\theta_1$ and $\theta_2$ and we are interested in making inference about
$\psi = \theta_1\theta_2$. This is the probability of simultaneous success from tossing two coins
where the coins have probability of heads equal to $\theta_1$ and $\theta_2$, respectively.

Suppose we have $n = 5$ and observe $x = 4$ and $y = 1$. In the following table
we give some $\gamma$-hpd intervals and $\gamma$-relative surprise (rs) intervals for $\psi$. We see
that these intervals are quite different. Also the relative surprise intervals always
dominate the hpd intervals in the sense that the Bayes factor and relative belief
ratio of the relative surprise interval are always greater than the corresponding
quantities for the hpd interval, as proven generally in Theorem 7. The estimate
determined by the hpd approach is the mode and this is given by .122 while the
LRSE is .186. While the hpd intervals, in this example, always have RB > 1 and
BF > 1, other methods of forming the intervals do not necessarily give intervals
with these properties. For example, if we took the left-tail of the posterior as
a $\gamma$-credible interval for $\psi$, then the left-tail .4-credible interval has RB = .730
and BF = .640.



$\gamma$    hpd             RB (hpd)   BF (hpd)   rs              RB (rs)   BF (rs)
.95    (.008, .447)    1.25       5.99       (.028, .501)    1.32      7.35
.75    (.032, .293)    1.47       2.82       (.071, .361)    1.69      3.35
.50    (.059, .216)    1.57       2.16       (.110, .284)    1.74      2.48
.25    (.089, .163)    1.63       1.84       (.119, .270)    1.76      2.36



These results are somewhat typical for this context. For example, when $n =
20$, $x = 19$, $y = 19$, the .95-hpd interval is (.675, .962) with RB = 16.16 and BF
= 305.40, while the .95-relative surprise interval is (.684, .990) with RB = 16.98
and BF = 322.59. In this case the posterior mode is .857 and the LRSE is .902.
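
The relative belief ratio and Bayes factor of an interval for $\psi$ in Example 2 can be approximated with a short simulation. The sketch below is an illustration and not the authors' computation. It uses two facts that follow from the stated model: with uniform priors the posteriors of $\theta_1$ and $\theta_2$ are Beta$(x+1, n-x+1)$ and Beta$(y+1, n-y+1)$, and the prior distribution function of a product of two independent uniforms is $\psi(1 - \ln\psi)$ on $(0, 1)$. The function names, the number of draws and the seed are hypothetical.

```python
import math
import random


def prior_cdf(psi):
    """Distribution function of the product of two independent Uniform(0,1)."""
    return psi * (1.0 - math.log(psi))


def posterior_content(lo, hi, x, y, n, draws=100_000, seed=0):
    """Monte Carlo estimate of the posterior probability that lo < psi < hi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta1 = rng.betavariate(x + 1, n - x + 1)
        theta2 = rng.betavariate(y + 1, n - y + 1)
        hits += lo < theta1 * theta2 < hi
    return hits / draws


def rb_and_bf(lo, hi, x, y, n):
    """Relative belief ratio and Bayes factor of the interval (lo, hi)."""
    prior = prior_cdf(hi) - prior_cdf(lo)
    post = posterior_content(lo, hi, x, y, n)
    rb = post / prior
    bf = (post / (1.0 - post)) / (prior / (1.0 - prior))
    return prior, post, rb, bf


# The .95 relative surprise interval reported for n = 5, x = 4, y = 1.
prior, post, rb, bf = rb_and_bf(0.028, 0.501, x=4, y=1, n=5)
```

Because the interval endpoints are rounded to three decimals and the posterior content is simulated, the resulting values should only be expected to agree approximately with the tabulated RB and BF.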

In frequentist contexts, a confidence region is said to be unbiased if the
probability of the region containing a particular false value is always less than
or equal to the probability of the region containing the true value. The following
result shows that relative surprise regions are unbiased in a generalized sense.

Theorem 9. For a relative surprise region, the prior probability of containing
an independent value generated from the prior is always less than the prior
probability of containing the true value, when it is generated from the prior.

Proof. From Lemma 6 we have that $\Pi_\Upsilon(C_\gamma(x) \mid x) \ge \Pi_\Upsilon(C_\gamma(x))$ and so it follows
that $E_M(\Pi_\Upsilon(C_\gamma(X))) \le E_M(\Pi_\Upsilon(C_\gamma(X) \mid X))$. By (4), $E_M(\Pi_\Upsilon(C_\gamma(X)))$ is the
prior probability of $C_\gamma$ containing a false value while $E_M(\Pi_\Upsilon(C_\gamma(X) \mid X)) =
E_\Pi(P_\theta(\Upsilon(\theta) \in C_\gamma(X)))$ is the prior probability that $C_\gamma(X)$ contains the true
value $\Upsilon(\theta)$ when $\theta \sim \Pi$, $X \sim P_\theta$. □
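
The repeated sampling interpretation of Theorem 9 can be illustrated by simulation in the location normal model of Example 1. The sketch below draws independent triples of a true value, a sample mean and a false value, and records how often each value falls in the relative surprise interval. It is a hypothetical illustration: the function names, the number of replications and the seed are assumptions, and the interval half-width is obtained by bisection from (6).

```python
import math
import random


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def half_width(n, xbar, sigma2, gamma):
    """Half-width k of the interval xbar +/- k with posterior content gamma."""
    prec = n + 1.0 / sigma2
    a = xbar / (sigma2 * math.sqrt(prec))
    b = math.sqrt(prec)
    lo, hi = 0.0, 1.0
    while Phi(a + b * hi) - Phi(a - b * hi) < gamma:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if Phi(a + b * mid) - Phi(a - b * mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n=1, sigma2=1.0, gamma=0.95, reps=4000, seed=1):
    """Proportions of true and false values covered by the rs interval."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    true_hits = 0
    false_hits = 0
    for _ in range(reps):
        theta = rng.gauss(0.0, sigma)                  # true value from the prior
        xbar = rng.gauss(theta, 1.0 / math.sqrt(n))    # sample mean given theta
        tau = rng.gauss(0.0, sigma)                    # independent false value
        k = half_width(n, xbar, sigma2, gamma)
        true_hits += abs(theta - xbar) <= k
        false_hits += abs(tau - xbar) <= k
    return true_hits / reps, false_hits / reps


true_rate, false_rate = simulate()
```

With these settings the proportion for the true value should be close to $\gamma$, while the proportion for the false value should be clearly smaller, in line with the generalized unbiasedness property.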

4. Reparameterizations 

A basic principle of inference is that inferences about a parameter of interest
should be invariant under reparameterizations, e.g., whatever rule we use to
obtain a $\gamma$-credible region $B_\gamma$ for a parameter of interest $\tau$, the rule should yield
the region $\Psi B_\gamma$ for any 1-1, sufficiently smooth, reparameterization $\psi = \Psi(\tau)$.
Relative surprise inferences satisfy this principle.

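Suppose, however, that we insist on forming credible regions for parameters
taking values in $\mathcal{T}$ by minimizing their $\Lambda$ content, where $\Lambda$ is also absolutely
continuous with respect to $\nu_{\mathcal{T}}$ on $\mathcal{T}$ with density $\lambda$. Let $\mathcal{T}$ be an open subset
of $R^k$ and $\mathcal{D}_{\mathcal{T},\mathcal{T}}$ denote the class of reparameterizations $\Psi : \mathcal{T} \to \mathcal{T}$ that are
1-1, onto, continuously differentiable and such that $\Psi^{-1}$ is continuously
differentiable. Then, by Theorem 3, the $\gamma$-credible region for $\psi = \Psi(\tau)$ that has
minimal $\Lambda$ content is given by the hpd-like region

$$B^*_{\Lambda,\gamma}(x) = \left\{ \psi_0 \in \mathcal{T} : \Pi_\Upsilon\left( \frac{\pi_\Upsilon(\tau \mid x)\, J_\Psi^{-1}(\tau)}{\lambda(\Psi(\tau))} > \frac{\pi_\Upsilon(\tau_0 \mid x)\, J_\Psi^{-1}(\tau_0)}{\lambda(\Psi(\tau_0))} \,\Big|\, x \right) \le \gamma \right\}, \quad \tau_0 = \Psi^{-1}(\psi_0), \tag{9}$$

where $J_\Psi(\tau)$ is the Jacobian of the transformation $\Psi$ evaluated at $\tau$.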

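Since $B^*_{\Lambda,\gamma}(x)$ is a $\gamma$-credible region for $\psi = \Psi(\tau)$, then $\Psi^{-1}B^*_{\Lambda,\gamma}(x)$ is a
$\gamma$-credible region for $\tau$. Let $[B_{\Lambda,\gamma}(x)] = \{\Psi^{-1}B^*_{\Lambda,\gamma}(x) : \Psi \in \mathcal{D}_{\mathcal{T},\mathcal{T}}\}$ be the class
of $\gamma$-credible regions for $\tau$ that arise via reparameterizations, when using the
measure $\Lambda$ to construct the credible regions. Each of the regions in $[B_{\Lambda,\gamma}(x)]$ is
a plausible candidate as a $\gamma$-credible region for the parameter of interest and it
is not clear how we should choose among them. The following result provides
an approach to this choice.

Lemma 10. For $\Psi \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$ we have that $\Psi^{-1}B^*_{\Lambda,\gamma}(x) = B_{\Lambda \circ \Psi,\gamma}(x)$ and so
$[B_{\Lambda,\gamma}(x)] = \{B_{\Lambda \circ \Psi,\gamma}(x) : \Psi \in \mathcal{D}_{\mathcal{T},\mathcal{T}}\}$.

Proof. For $A \subset \mathcal{T}$, we have that $\Lambda \circ \Psi(A) = \Lambda(\Psi(A)) = \int_{\Psi(A)} \lambda(\tau)\, \nu_{\mathcal{T}}(d\tau) =
\int_A \lambda(\Psi(\tau))\, J_\Psi(\tau)\, \nu_{\mathcal{T}}(d\tau)$ and so the density of $\Lambda \circ \Psi$ is $\lambda(\Psi(\tau)) J_\Psi(\tau)$. Therefore,
by Theorem 3 and (9), the result follows. □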

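So it is equivalent to think of $[B_{\Lambda,\gamma}(x)]$ as containing all the $\gamma$-credible regions
for $\tau$ obtained by minimizing the $\Lambda \circ \Psi$ content for some $\Psi \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$. This result
says that choosing among the elements of $[B_{\Lambda,\gamma}(x)]$ is equivalent to choosing
which measure $\Lambda \circ \Psi$ we should use to optimize with respect to.

Now suppose that $\Lambda$ is a probability measure and define $\Psi_\Lambda : \mathcal{T} \to [0, 1]^k$ so
that $\Psi_\Lambda(\tau) \sim \text{Uniform}([0, 1]^k)$ when $\tau \sim \Lambda$, e.g., we can take $\Psi_\Lambda$ to be the
probability transform. Then, for probability measures $\Lambda_1$ and $\Lambda_2$, we can define
$\Psi_{\Lambda_1,\Lambda_2} : \mathcal{T} \to \mathcal{T}$ by $\Psi_{\Lambda_1,\Lambda_2} = \Psi_{\Lambda_1}^{-1} \circ \Psi_{\Lambda_2}$ and thus $\Lambda_1 \circ \Psi_{\Lambda_1,\Lambda_2} = \Lambda_2$. Therefore,
if $\Psi_{\Lambda_1,\Lambda_2} \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$, we have that $[B_{\Lambda_1,\gamma}(x)] = [B_{\Lambda_2,\gamma}(x)]$. We say that $\Lambda_2$ is
obtained via a smooth reparameterization from $\Lambda_1$ when $\Psi_{\Lambda_1,\Lambda_2} \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$. We
have the following result immediately from Corollary 4.

Lemma 11. If $\Lambda$ is a probability measure and $\Psi_\Lambda^{-1} \circ \Psi_{\Pi_\Upsilon} \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$, then $C_\gamma(x) \in
[B_{\Lambda,\gamma}(x)]$.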

Note that, when $\Psi_{\Pi_\Upsilon}$, $\Psi_\Lambda$ are the respective probability transforms, $\lambda$ is
continuous and positive and $\pi_\Upsilon$ is positive and continuous, then by the
inverse function theorem, we must have that $\Psi = \Psi_\Lambda^{-1} \circ \Psi_{\Pi_\Upsilon} \in \mathcal{D}_{\mathcal{T},\mathcal{T}}$ and
$J_\Psi(\tau) = \pi_\Upsilon(\tau)/\lambda(\Psi(\tau))$. Lemma 11 says that, in very general circumstances,
when we choose to optimize with respect to a probability measure $\Lambda$ on $\mathcal{T}$, a
relative surprise region is always available as an equivalent credible region under
a reparameterization.

As previously noted, when we consider choosing among the elements of
$[B_{\Lambda,\gamma}(x)]$ we need only consider which measure $\Lambda \circ \Psi$ is most appropriate.
Theorem 3 says that choosing $\Lambda \circ \Psi$ leads to a region that minimizes the prior
probability of covering a false value $\tau \sim \Lambda \circ \Psi$. Unless there are good reasons to
do otherwise, the most appropriate weighting to apply to false values is given
by the prior $\Pi_\Upsilon = \Lambda \circ \Psi_\Lambda^{-1} \circ \Psi_{\Pi_\Upsilon}$. This leads to a region that focuses on the
parameter values that we believe are a priori important. For example, choosing
a credible region that minimized the probability of covering false values that
are well out of the range where the prior placed most of its mass would seem to be
clearly inappropriate, as we presumably know a priori that these are unrealistic
values.

The following illustrates the need for a rule to select a credible region.

Example 3 (All $\gamma$-credible intervals can arise from reparameterizations).
Suppose that the posterior of $\tau \in R^1$ is absolutely continuous, has finite
second moment and $\pi_\Upsilon(\tau \mid x) > 0$ for every $\tau \in R^1$. Let $\tau_1 > \tau_0$ be such that
$\Pi_\Upsilon((\tau_0, \tau_1) \mid x) = \gamma$. Let $\Lambda$ be a probability measure with density $\lambda(\tau) > 0$
for every $\tau \in R^1$. From Lemma 10, we have that the set of all $\Lambda$ hpd-like
$\gamma$-credible intervals for $\tau$ obtained via reparameterizations is the same as the set
of all $\Lambda \circ \Psi$ hpd-like $\gamma$-credible intervals for $\tau$ as $\Psi$ ranges over all
reparameterizations. Then, there is a constant $k_\gamma(x, \Psi)$ such that $B_{\Lambda \circ \Psi,\gamma}(x) = \{\tau :
\pi_\Upsilon(\tau \mid x)/\lambda_\Psi(\tau) \ge k_\gamma(x, \Psi)\}$ where $\lambda_\Psi$ is the density of $\Lambda \circ \Psi$. Clearly, there
exists a $\Psi$ such that $\lambda_\Psi(\tau) \propto \pi_\Upsilon(\tau \mid x)(1 + (\tau - (\tau_0 + \tau_1)/2)^2)$, i.e., the
probability measure with this density is a smooth reparameterization of $\Lambda$, and so
$B_{\Lambda \circ \Psi,\gamma}(x) = (\tau_0, \tau_1) \in [B_{\Lambda,\gamma}(x)]$.

While the reparameterization in the example depends on the data, it is not
clear generally how to rule out reparameterizations. With the relative surprise
rule this is not an issue, because of invariance.

So far we have restricted the discussion to probability measures $\Lambda$. Suppose,
however, that $\Lambda$ is a bounded measure on $\mathcal{T}$. It is immediate, for any positive
constant $b$, that $B_{b\Lambda,\gamma}(x) = B_{\Lambda,\gamma}(x)$. So we can take $b = 1/\Lambda(\mathcal{T})$ and simply
treat $\Lambda$ as a probability measure, as we get the same set of credible regions and
$C_\gamma(x) \in [B_{\Lambda,\gamma}(x)]$. Suppose now that $\Lambda$ is an unbounded measure with density
$\lambda$ with respect to $\nu_{\mathcal{T}}$. Further suppose that there is a sequence of bounded
measures $\Lambda_n$ with densities $\lambda_n$ with respect to $\nu_{\mathcal{T}}$, such that $\lambda_n \to \lambda$ pointwise as
$n \to \infty$. For example, if $\Lambda$ and $\nu_{\mathcal{T}}$ are volume measure on $R^k$, then $\lambda = 1$ and we
can take $\lambda_n$ to be $(2\pi n)^{k/2}$ times a $N_k(0, nI)$ density. Then, when the posterior
distribution of $\pi_\Upsilon(\tau \mid x)/\lambda(\tau)$ is continuous, we have that $B_{\Lambda_n,\gamma}(x) \to B_{\Lambda,\gamma}(x)$
as $n \to \infty$, since $\liminf B_{\Lambda_n,\gamma}(x) = \limsup B_{\Lambda_n,\gamma}(x) = B_{\Lambda,\gamma}(x)$ up to a set
having posterior measure 0. If $\lambda_n$ and $\pi_\Upsilon$ are positive and continuous, then we
have that $C_\gamma(x) \in [B_{\Lambda_n,\gamma}(x)]$ for each $n$. Therefore, $B_{\Lambda,\gamma}(x)$ is approximated by
$B_{\Lambda_n,\gamma}(x)$ for large $n$ and $C_\gamma(x)$ is equivalent to this set under a
reparameterization. Accordingly, we can think of $C_\gamma(x)$ as being approximately equivalent to
$B_{\Lambda,\gamma}(x)$ under a reparameterization.

5. Conclusions 

Relative surprise regions have been shown to minimize the prior probability
of covering a false value from the prior. This prior probability can be seen to
serve as a measure of accuracy of a credible region and can be used for design
purposes. Further, relative surprise regions have optimal properties with respect
to the Bayes factor and relative belief ratio of the region. Finally, we have shown
that relative surprise regions arise very naturally when we consider choosing
among equivalent credible regions based on reparameterizations.

The relevance of our results in a particular application depends on the prior.
In our view this is no different than concerns about the relevance of our choice of
a sampling model in a problem, i.e., if we make a poor choice, then any inferences
drawn based on this model are at least suspect. Model checking methods can
increase our confidence, when the model passes, that our choice makes sense.
Similarly, methods for checking for prior-data conflict, such as those discussed
in Evans and Moshonov (2006, 2007), can increase our confidence that the prior
we have chosen makes sense. When the model and prior pass such checks, then
optimal inferences drawn from such ingredients have greater force. In particular,
the repeated sampling interpretations based upon the prior then seem much
more appropriate to us than the common frequentist practice of looking for
procedures that possess good properties uniformly over all values of the model
parameter, i.e., even at values of the parameter that we believe a priori are not
relevant.